Stochastic Semi-supervised Learning
Authors
Abstract
In this paper, we describe the stochastic semi-supervised learning approach used in our submission to all six tasks of the 2009-2010 Active Learning Challenge. The method is designed to tackle binary classification when the number of labeled data points is extremely small and the two classes are highly imbalanced. It starts from a single positive seed provided by the contest organizers. We randomly pick additional unlabeled data points and treat them as “negative” seeds, exploiting the fact that positive labels are rare across all datasets. A classifier is trained on these “labeled” points and then used to score the unlabeled data. The final result is the average over n such stochastic iterations. Once a large number of labels had been purchased, we switched to supervised learning. Our approach works well on 5 of the 6 datasets, and our overall results ranked 3rd in the contest.
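The core loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the base classifier, the parameter names (`n_iters`, `n_neg`), and the sampling sizes are all our own assumptions, with scikit-learn's `LogisticRegression` standing in for the unspecified classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stochastic_semisupervised_scores(X, pos_seed_idx, n_iters=50, n_neg=20, seed=0):
    """Average predicted positive-class probabilities over n stochastic
    iterations.  In each iteration the positive seed(s) are paired with
    randomly drawn unlabeled points treated as "negative" seeds (a
    reasonable proxy when positives are rare), a classifier is fit, and
    the whole pool is scored."""
    rng = np.random.default_rng(seed)
    pos_seed_idx = np.asarray(pos_seed_idx)
    unlabeled = np.setdiff1d(np.arange(len(X)), pos_seed_idx)
    scores = np.zeros(len(X))
    for _ in range(n_iters):
        # Draw random unlabeled points and treat them as "negative" seeds.
        neg = rng.choice(unlabeled, size=n_neg, replace=False)
        train_idx = np.concatenate([pos_seed_idx, neg])
        y = np.concatenate([np.ones(len(pos_seed_idx)), np.zeros(n_neg)])
        clf = LogisticRegression().fit(X[train_idx], y)
        scores += clf.predict_proba(X)[:, 1]
    return scores / n_iters  # average over the stochastic iterations
```

Averaging over many random "negative" draws damps the effect of any single iteration that accidentally labels a true positive as negative.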
Similar resources
Semi-Supervised Convex Training for Dependency Parsing
We present a novel semi-supervised training algorithm for learning dependency parsers. By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems. To demonstrate the benefits of this approach, we apply the technique to learning dependency parsers from...
Semi-supervised Learning with Sparse Autoencoders in Phone Classification
We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition based on deep neural networks. As opposed to unsupervised initialisation followed by supervised fine tuning, our method takes advantage of both unlabelled and labelled data simultaneously through minibatch stochastic gradient descent. We tested the me...
Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs
We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described in [13] for the construction of mini-batches for stochastic gradient descent (SGD) based on synthesized partitions of an affinity graph that are consistent wi...
Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations
We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition with limited linguistically annotated material. Our method combines sparse autoencoders with feed-forward networks, thus taking advantage of both unlabelled and labelled data simultaneously through mini-batch stochastic gradient descent. We tested the...
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
We consider the situation in semi-supervised learning, where the “label sampling” mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a supervised learning procedure which can be use...
Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent
Semi-supervised learning tries to employ a large collection of unlabeled data and a few labeled examples to improve generalization performance, which has proved meaningful in real-world applications. The bottleneck of existing semi-supervised approaches lies in overly long training times due to the large-scale unlabeled data. In this article we introduce a novel method for semi-supervised l...
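The general idea behind a semi-supervised linear SVM trained with SGD can be sketched as follows. This is a generic illustration, not the cited article's algorithm: labeled points get the usual hinge loss, while unlabeled points get a symmetric hinge that pushes the boundary into low-density regions; all names and hyperparameters are our own.

```python
import numpy as np

def s3vm_sgd(Xl, yl, Xu, lam=0.01, lam_u=0.1, epochs=50, lr=0.1, seed=0):
    """Toy semi-supervised linear SVM trained with SGD.

    Labeled points contribute the hinge loss max(0, 1 - y * w.x);
    unlabeled points contribute the symmetric hinge max(0, 1 - |w.x|),
    which encourages the boundary to avoid dense unlabeled regions.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([Xl, Xu])
    y = np.concatenate([yl, np.zeros(len(Xu))])            # 0 marks unlabeled
    labeled = np.concatenate([np.ones(len(Xl), bool), np.zeros(len(Xu), bool)])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):                  # one SGD pass
            m = w @ X[i]
            if labeled[i]:
                # Subgradient of the hinge loss on a labeled point.
                grad = -y[i] * X[i] if y[i] * m < 1 else 0.0
            else:
                # Subgradient of the symmetric hinge on an unlabeled point.
                grad = -lam_u * np.sign(m) * X[i] if abs(m) < 1 else 0.0
            w -= lr * (grad + lam * w)                     # step with L2 penalty
    return w
```

Because each step touches a single point, one pass costs O(n) regardless of how many points are unlabeled, which is what makes SGD attractive at this scale.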
Journal:
Volume/Issue:
Pages: -
Publication date: 2011